Skip to main content

Singlestore

Overview

Singlestore is a cloud native database supporting a variety of interaction styles and data types. The query styles available are SQL or MongoDB. The former uses a MySQL protocol and the latter a MongoDB API protocol. For more information see https://www.singlestore.com/.

Qarbine supports native interactions for both of these styles and is built to handle complex data structures. For example, given the structure shown below

  

Qarbine can perform a query and then easily format a report such as the following.

  

There is no flattening of data such as is required by legacy tools. This report result can then be exported into various popular formats and easily shared within leading collaboration tools.

Defining a Data Source

Overview

A Data Source is a Qarbine component responsible for retrieving data from somewhere. At a high level it has a name, a description and some arbitrary query string which when sent to the associated Qarbine Data Service endpoint returns some data. The overall execution flow for an analysis, including the optional prompt component, is shown below.

  

A single data source can be referenced by name from multiple Qarbine template components. This enables a single point of change when perhaps, an index is added, or some other query tweak is necessary. The alternative is to attempt to find all templates impacted by a schema or index change for example. This component reusability is especially beneficial when team members have varying roles and skills.

SQL Query Language

For details on Singlestore SQL queries see https://docs.singlestore.com/cloud/reference/sql-reference/

Generally speaking Qarbine simply passes along the query as-is to the Singlestore server to execute and return the answer set.

Example with JSON Data

The example data is from the Singlestore tutorial found at https://docs.singlestore.com/cloud/create-a-database/using-json/

The general schema of this table is depicted below.

  

The data source specification below retrieves 25 orders.

select * from orders limit 25

A sample result element is shown below.

  

We can remove the order_doc containing field via this updated specification.

#pragma pullFieldsUp order_doc

select * from orders

A sample result element is shown below.

  

This provides a simpler way to access data items by removing the intermediate level order_doc field.

Example with Vector Data

The example data is from the Singlestore tutorial found at https://docs.singlestore.com/cloud/developer-resources/functional-extensions/working-with-vector-data/

A basic query is shown below.

SELECT id, comment, json_array_unpack(vector), category FROM comments

The results are shown below with the first row selected.

  

Remember that the vector argument in the query must match the model used to create the vectors in the data as well. The former is based on the AI Assistant as configured by the Qarbine administrator.

As noted on the Singlestore page you can also pass in a vector from your application.

SET @query_vec = json_array_pack('[ a query vector from your application ]');

In a Qarbine data source specification you can then use a variable placeholder. For example, the following can be used if the variable is a string value.

SET @query_vec = json_array_pack(@appDefinedVector);

If the variable is an array of numbers, then use the Qarbine block syntax which allows a macro expression as shown below.

SET @query_vec = json_array_pack( [! json(@appDefinedVector) !] );

The Qarbine json() call accepts the array of numbers and returns its JSON string representation which is wanted by the Singlestore json_array_pack function.

User Defined Variables

Singlestore allows multi-statement queries and also the use of user defined variables. A data source specification example is shown below.

// Using @@ because a @ indicates a reference to a Qarbine variable
SET @@query_vec = json_array_pack('[0.1, 0.16, 0.36, 0.05]');
SELECT id, dot_product(@@query_vec, vector) AS vec_score,
match(comment) against ("restaurant") AS ft_score,
(vec_score + ft_score) / 2 AS combined_score, comment
FROM comments
ORDER BY combined_score DESC
LIMIT 2

Sample results are shown below.

  

Example Nest JSON Data Setup

There is a simple set of data described on the bottom half of the documentation page at https://docs.singlestore.com/db/v7.5/create-a-database/using-json/

For quick reference, below are the SQL statements used to populate the sample data.

CREATE TABLE json_tab (`id` INT(11) DEFAULT NULL,`jsondata` JSON COLLATE utf8_bin);

INSERT INTO json_tab VALUES
( 8765 ,' {"city":"SFO","sports_teams":[{"sport_name":"football","teams": [{"club_name":"Raiders"},{"club_name":"49ers"}]},
{"sport_name":"baseball","teams" : [{"club_name":"As"},{"club_name":"SF Giants"}]}]}') ;

INSERT INTO json_tab VALUES
( 9876,'{"city":"NY","sports_teams" : [{ "sport_name":"football","teams" : [{ "club_name":"Jets"},{"club_name":"Giants"}]},
{"sport_name":"baseball","teams" : [ {"club_name":"Mets"},{"club_name":"Yankees"}]},
{"sport_name":"basketball","teams" : [{"club_name":"Nets"},{"club_name":"Knicks"}]}]}');

Notice the elements have 2 levels of embedded array fields and an embedded document.

When using SQL, the embedded arrays and documents need to be “flattened”. The query to do this is shown below.

  

The issue with this approach is that it explodes the answer set from 2 rows to 12 as shown below. The larger the embedded arrays the larger the expansion.

  

MongoDB Query Language

For details on Singlestore Kai MongoDB feature see https://docs.singlestore.com/cloud/reference/singlestore-kai/

The simplest data source specification you can use retrieves all of the objects in the json_tab example collection is shown below.

db.json_tab.find()

The results are shown below with the first row selected.

  

To simplify reviewing the results and also writing template formulas we can add a Qarbine pragma as shown below.

#pragma pullFieldsUp jsondata 
db.json_tab.find()

These results are shown below with the first row selected.

  

This is a fairly deep document structure. Legacy tools would have great difficulty traversing the document and performing analysis on it. The answer set explosion caused by flattening can quickly become untenable to transfer and even to make sense of. The MongoDB Kai approach is much better at handling such data shapes. Unlike legacy SQL oriented tools, Qarbine is built to handle this scenario.

Managing Answer Set Size

The default maximum number of rows starts off at 25 for a new data source. This is useful to evolve a query from a concept to one that you have verified returns the desired answer set. As noted, any native way of limiting an answer set size is the preferred approach. This setting is in the component dialog as shown below and also accessible by clicking the ‘Gear’ icon.

  

Once you are done drafting you can adjust this parameter. A “0” indicates there is no maximum. A number greater than 0 indicates to limit the final answer set size to that number of rows. This answer set truncation comes after any native query limit. So, if the answer set from the data endpoint is quite large, that content has to be returned to the Qarbine host. It then may truncate the number of rows. It is best to truncate at the query level (i.e., use a limit) to reduce the content sent from the data endpoint to the Qarbine host in the first place.

Adjusting the Maximum Rows

Recall the default maximum rows at the component level is 25. When you are satisfied with your query you can change that setting by clicking.

  

Adjust the setting to “0” indicating no Qarbine answer set truncation.

  

Click

  

Defining an Analysis Template

Overview

A template defines how to process the data being retrieved from Data Source queries and other data expressions. It also defines formulas, formatting options, and other analysis and presentation options. The overall execution flow for an analysis, including the optional prompt component, is shown below

  

Using the Template Designer

Shown below is example output from the city sports teams data in the json_tab collection.

  

The right hand side of the Template Designer will show any meta data about the data source data. (There must be no cell chosen in the grid area for this to appear).

The page header and report header sections are shown below with their respective cells.

  

The template is associated with the Data Source defined above. This is shown below.

  

. . .

  

. . .

  

As the main rows are iterated, they are set into the variable named ‘location’. Within the Template Designer the right hand side will show the general structure of the primary data source. For our example the geneal structure is shown below.

  

For each main element group 1.1 iterates through the sports_teams array and assigns that element into the variable named “sports_teamsElement”. Below is the group’s properties of interest.

  

  

  

For each sports_teamsElement value group 1.1.1 iterates through the teams array and assigns that element into the variable named “teamsElement”. Below is the group’s properties of interest.

  

  

  

In summary there is the main “location” loop which contains the @sports_temsElement” loop which in turn contains the “teamsElement” loop. In the Template Designer you can easily see this data looping on the right hand side using the drop down option shown below.

  

The template cells for these groups are shown below.

  

The ‘#’ prefix indicates a field name of the currently active element. The ‘@’ prefix indicates a variable name. The formula

=concat(@sports_teamsElement.sport_name, ' Teams')

uses dot notation to access the sport_name value of the sports_teamsElement variable. The count() formulas act similarly to their similar Excel functions. Qarbine has over 450 built-in functions to define various formulas.

Prompt Integration

Overview

Qarbine prompts provide a way to obtain runtime values and variables for data source and template execution. To avoid hardcoding, prompts can use macro formulas to run queries which populate list widgets. Prompts are defined in a no code manner using the Prompt Designer. A large variety of widgets are available from a simple entry field to a multi-select list populated from a data query. Shown below is the execution flow when there is a Prompt component.

  

The Prompt Designer supports a large variety of input widgets including entry fields, check boxes, radio button groups, sliders, and file input.

Example

For this tutorial we will prompt the user for which city to retrieve the sports team information. Below is the prompt that will be presented to the user.

  

The prompt has 2 elements.

  

The first element definition is shown below.

  

Notice the image URL can be a macro language expression and not just a simple string. The second element definition’s key properties are shown below.

     

  

This prompt component is saved with the name “Prompt for @city”.

A new Data Source is created “City sports team using MongoDB query for @city” with the specification of

#pragma pullFieldsUp jsondata 
db.json_tab.find( { "jsondata.city": @city} )

For tutorial simplicity we will save the current template using “Save as” with the name “City sports team using MongoDB query for @city”. The template’s properties are updated to refer to the new data source.

  

The prompt reference is shown below.

  

The template is saved and then ready to run.

Next Steps

Accessing Your Database

To configure access to your database see the guides at

Querying Your Database

For database specific interaction guides navigate to

References

General documentation can be found at https://docs.singlestore.com/.

For full text search information see
https://docs.singlestore.com/cloud/developer-resources/functional-extensions/working-with-full-text-search/

For vector search information see
https://www.singlestore.com/blog/the-power-of-sql-for-vector-database-operations-part-1/?utm_medium=social&utm_source=linkedin

For geospatial search information see
https://docs.singlestore.com/cloud/developer-resources/functional-extensions/working-with-geospatial-features/